Abstract

Text classification is a widely studied problem, and it can
be considered solved for some domains and under certain
circumstances. There are scenarios, however, that have
received little or no attention at all, despite their relevance
and applicability. One such scenario is early text
classification, where one needs to know the category of a
document by using partial information only. A document is
processed as a sequence of terms, and the goal is to devise a
method that can make predictions as early as possible. The
importance of this variant of the text classification problem
is evident in domains like sexual predator detection, where
one wants to identify an offender as early as possible. This
paper analyzes the suitability of the standard naive Bayes
classifier for approaching this problem. Specifically, we
assess its performance when classifying documents after seeing
an increasing number of terms. A simple modification to the
standard naive Bayes implementation allows us to make
predictions with partial information. To the best of our
knowledge, naive Bayes has not been used for this purpose
before. Through an extensive experimental evaluation we show
the effectiveness of the classifier for early text
classification. What is more, we show that this simple
solution is very competitive when compared with more elaborate
state-of-the-art methodologies. We foresee that our work will
pave the way for the development of more effective early text
classification techniques based on the naive Bayes formulation.

Keywords: Early text classification; sequential text
classification; naive Bayes; classification with partial
information.

1 Introduction 

Text classification is the task of assigning documents
to their correct categories [H]. This is one of the
most studied topics within natural language processing.
Significant progress has been made in the last two decades,
and nowadays the text classification problem
is considered to be solved in some scenarios and under
certain circumstances (e.g., news classification with
plenty of data). There are, however, settings of the text
classification problem that have received little attention
despite the wide applicability they may have. One
such scenario is that of early text classification, which
deals with the development of predictive models that are
capable of determining the class a document belongs to
as soon as possible. A text is assumed to be processed
sequentially, starting at the beginning of the document
and reading input words one by one. The aim is to
make predictions with as little information as possible.

The early text classification topic has received little
attention in the community, and only a few works have
approached similar scenarios [4] (please note that in that
work the problem is not stated as one of early recognition).
Despite its low popularity, this topic has major potential in
practical applications. For instance, consider the problem of
detecting sexual predators in chat conversations. Here, the
goal is to read a conversation sequentially and to determine
as early as possible whether a sexual predator is involved;
clearly, a detection that uses the whole conversation can
only serve forensics rather than prevention. Other sample
applications include: any kind of conversation analysis that
requires a fast response (e.g., cyber-bullying prevention,
adaptive/intelligent answering systems); trending-topic
discovery (e.g., analyzing comments on social networks and
determining as soon as possible whether a topic will become a
trend); content filtering (e.g., filtering
inappropriate/illegal content in local networks); and author
profiling (e.g., inferring the age, gender or interests of a
person from as little written information as possible).

This paper explores the suitability of one of the
most popular methods for text classification, i.e., naive
Bayes mill], to approach the early-classification setting:
early naive Bayes. Specifically, we evaluate the
capabilities of this classifier to make predictions when
seeing an increasing number of terms from documents.
A simple modification to the standard naive Bayes
implementation allows us to make predictions with partial
information. Despite its simplicity, the proposed extension
obtains competitive performance in standard text
classification tasks and in sexual predator detection. In
fact, we show that the proposed modification compares
favorably with the only existing work that addresses a
similar task. Hopefully, our work will motivate research
on further extensions to this classifier for early text
classification.

The remainder of this paper is organized as follows.
The next section reviews related work on early text
classification and on extensions to naive Bayes that address
closely related problems. Then, Section 3 describes
the naive Bayes classifier and the modification we propose
to make early predictions. Section 4 reports experimental
results that show the effectiveness of the proposal.
Section 5 presents conclusions and discusses future work
directions.

2 Related work 

This section reviews related work on both: early text 
classification and extensions to naive Bayes to face 
similar problems. 

2.1 Early text classification

To the best of our knowledge, the early text categorization
problem has been approached only in [4], although the authors'
main focus was not on making predictions earlier but
on improving the classification performance with a
sequential reading approach. In that work, the authors
process documents on a sentence-level basis. At every
time step t, they read a sentence and attempt to
determine the class of the document, where multi-label
classification is allowed. They proposed a Markov
decision process (MDP) to approach the problem, where
two possible actions were allowed: read the next sentence,
or classify. Each sentence is represented by
its tfidf representation, and a classifier is trained to
learn good/bad state-action pairs (10,000 examples were
randomly generated) in a high-dimensional space.

The performance of their method was evaluated
on standard text classification data sets. Although
the performance of that method is competitive (it was
compared to an SVM classifier), it remains unknown
whether a much simpler approach would be as
effective as the complex procedure in [4]. In Section 4.2
we compare the proposed extension of naive Bayes with
that previous work. We show that our proposal is competitive
in terms of performance and also has the following
advantages: it is scalable in the number of categories
(the MDP evaluates every possible state after reading
each sentence, whereas ours simply adds probabilities); it
can make predictions with as little information as
no words at all (using prior information only; more
importantly, it can make predictions at
any time); it processes documents on a word-level basis
(i.e., one word is added at a time, while the MDP requires
processing whole sentences); training is much more
efficient (it has the same training complexity as a standard
naive Bayes classifier, while the MDP requires high-complexity
training procedures); and the resulting model is much
simpler.

Although the early text classification problem has
not been studied elsewhere, it is worth mentioning
works that have approached related tasks. In [3],
the authors propose a hidden Markov model (HMM)
to classify passages within documents. The task is
information retrieval, and a document is considered as
relevant or irrelevant (i.e., two classes) to a given
category/query. The document is decomposed into
passages, each of which is considered by the HMM as
relevant or irrelevant to the classification. No attempt
is made to perform classification early, although it is
interesting that the proposed model is a generalization
of the multinomial naive Bayes we consider in this work
(again, for the two-class whole-document classification
problem).

In [5] the authors extend the MDP proposed for 
sequential text classification to deal with any other type 
of data. The formulation is almost the same as in [4], 
although this time the MDP can decide what feature to 
sample from the instance under analysis (i.e., there is no 
sequential input). Furthermore, the MDP is equipped 
with a mechanism that aims to minimize the number of 
features to use for classification. Clearly, this extended 
MDP is not applicable to the early text classification 
domain (words cannot be chosen from documents, they 
appear sequentially). 

In summary, early text classification has received
remarkably little attention so far. This may be because
few applications in the past required coping with this
problem. Nowadays, however, with much of the world
population online, there is a need for technology that can
anticipate certain events, either to prevent undesired
effects or to act as fast as possible and take the lead in
information technology.

2.2 Extending naive Bayes

Naive Bayes has been used extensively in text mining and in
machine learning in general because of its high performance in
several domains. Several modifications and extensions
have been proposed to broaden the scope of the classifier.
Related to our work, the following extensions have
been reported in the literature:

• Alleviating the independence assumption of naive Bayes.
This is perhaps the most studied topic in terms of
extending the classifier. The independence assumption may
be too strong for some domains/applications; therefore,
several works have been proposed that try to relax it. Most
notably, the TAN [6], AODE [17], and WANBIA [20]
extensions have reported outstanding results.
Nevertheless, the focus there is on relaxing the attribute
independence assumption, and not on working with
partial information. One should note, however,
that these extended versions of naive Bayes may be
well suited for early text classification, as
attribute-dependency information can help the algorithm to
classify texts earlier.

• Anytime naive Bayes. The goal of this type
of extension is to provide naive Bayes with mechanisms
that allow it to make predictions at any time [Mil].
This means that the algorithm has to
be ready to provide a prediction under time constraints:
the classifier can spend increasing amounts
of time on inference, but it must provide an
answer when requested; usually accuracy increases
as more time is allowed. This type of method is
related to our proposal in that the system has to
be ready to make predictions at any time. However,
the granularity of information processing is different:
in anytime classification a whole instance is
seen, whereas in early text classification only part of an
instance is available.

• Incremental naive Bayes. This refers to developing
learning and inference mechanisms that allow the
classifier to be trained in an online learning setting [H
[12]. That is, reading one sample (or one batch of samples)
at a time, the model makes predictions for the
incoming samples and is then provided with the
correct labels; next, the model parameters are
updated accordingly. This type of method is
related to our proposal in that partial information
is processed incrementally, although one should
note that the information units are instances and not
words/attributes.

• Naive Bayes for incomplete information.
These extensions aim at helping naive Bayes to
deal with missing information, usually at the attribute
level, for instance by equipping the classifier
with mechanisms to work under highly sparse
representations (e.g., in short text categorization)
[TDJ m [7] [ID]. These methods are mostly
based on smoothing attribute-class probabilities
and often use co-occurrence statistics. Although
they do not deal with early text classification, methods
of this type are relevant because smoothing plays
a key role when working with partial information
(everything not seen so far has to be smoothed).


In summary, there have been many attempts to
improve and extend naive Bayes to make it robust against
several limitations. However, to the best of our knowledge,
it has not been used for early text classification before.
This is somewhat surprising given that, as shown
in the next section, naive Bayes classifiers can naturally
deal with partial information.

3 Early text classification with Naive Bayes

This section describes the way we use the naive Bayes
classifier for early text classification.

3.1 Naive Bayes classifier

We first describe the standard naive Bayes classifier.
Consider a data set
$\mathcal{D} = \{(\mathbf{x}_i, y_i)\}_{i \in 1,\ldots,N}$
with $N$ pairs of instances ($\mathbf{x}_i$) and labels
($y_i$) associated to a supervised classification problem.
Assuming that $\mathbf{x}_i \in \mathbb{R}^q$ and
$y_i \in C = \{1, \ldots, K\}$, we have a $K$-class
classification problem with numeric¹ attributes.

Under the naive Bayes classifier, the class for an
unseen instance $\mathbf{x}_T = (x_{T,1}, \ldots, x_{T,q})$
is given by:

(3.1)    $\hat{C} = \arg\max_{C_i} P(C_i \mid \mathbf{x}_T)$

From Bayes' theorem it follows that the posterior
probability above can be estimated as:

(3.2)    $P(C_i \mid \mathbf{x}_T) = \frac{P(\mathbf{x}_T \mid C_i)\, P(C_i)}{P(\mathbf{x}_T)}$

The denominator can be removed from Equation (3.2)
as it does not affect the decision:

(3.3)    $P(C_i \mid \mathbf{x}_T) \propto P(\mathbf{x}_T \mid C_i)\, P(C_i)$

The assumption of naive Bayes is that the attributes of
$\mathbf{x}_T$ are independent of each other given its
class, that is:

(3.4)    $P(C_i \mid \mathbf{x}_T) \propto \prod_{j=1}^{q} P(x_{T,j} \mid C_i)\, P(C_i)$

The maximum likelihood estimate of the prior of
class $C_i$ is given by:

(3.5)    $P(C_i) = \frac{|X_i|}{|\mathcal{D}|}$

where $X_i$ is the set of all instances in $\mathcal{D}$ that
are labeled with class $C_i$. Hence, the key of the naive
Bayes classifier lies in the estimation of
$P(\mathbf{x}_T \mid C_i)$, or more precisely of
$\prod_{j=1}^{q} P(x_{T,j} \mid C_i)$. Depending on
the type of data (e.g., binary, discrete, or real), a
different distribution may be assumed for computing
$P(x_{T,j} \mid C_i)$ (e.g., Bernoulli, multinomial, or
Gaussian, respectively). In text classification one of the most
effective implementations is based on the multinomial
distribution, with documents represented by their
term-frequency representation (i.e., we know, for each
document, the number of times each term from the
vocabulary occurs) na ITT]. Accordingly, we focus on
this implementation; this means we assume w.l.o.g. that
$\mathbf{x}_i \in \mathbb{Z}^q$ (i.e., the representation of a
document is a vector of frequency values / integers).

Assuming a multinomial distribution for the model,
the maximum likelihood estimate for the
term of interest is:

(3.6)    $P(\mathbf{x}_T \mid C_i) \propto \prod_{j=1}^{q} P(x_{T,j} \mid C_i)^{f_{j,T}}$

where $f_{j,T}$ is the value of the $j$th attribute in instance
$\mathbf{x}_T$ (in text classification, $f_{j,T}$ is the
frequency of occurrence of the $j$th term in document $T$), and

(3.7)    $P(x_{T,j} \mid C_i) = \frac{1 + F_{j,C_i}}{q + \sum_{k} F_{k,C_i}}$

where $F_{j,C_i}$ is the sum of the values of the $j$th
attribute in documents of class $C_i$. The derivation of
Equation (3.6) removes factorial terms that do not affect the
final decision. For more details we refer the reader to [Miin].
In the description above we did not assume a text
categorization problem because the same results apply
to any type of (multinomial-distributed) attributes.
In the following we use text-mining terminology, but
we emphasize that the description generalizes to other
problems.

¹One should note that in text classification we can transform
any document into a numeric vector with the bag-of-words
representation, i.e., a vector of length $q$, where $q$ is the
vocabulary size and each element of the vector indicates the
relevance of a term for describing the content of the document.
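The estimates in Equations (3.5) and (3.7) can be illustrated with a minimal Python sketch. This is not the authors' implementation (the experiments in this paper were run in Matlab); the function names `train_multinomial_nb`, `log_scores` and `predict` are hypothetical, documents are assumed to be given as lists of tokens, and two choices are assumptions of the sketch: scores are accumulated in log space to avoid numerical underflow, and terms outside the training vocabulary are skipped.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels):
    """Estimate log-priors (Eq. 3.5) and smoothed log term probabilities (Eq. 3.7).

    docs   : list of documents, each a list of tokens (assumed input format)
    labels : list of class labels, one per document
    """
    vocab = sorted({w for tokens in docs for w in tokens})
    q = len(vocab)                                # vocabulary size
    class_doc_counts = Counter(labels)            # |X_i| for every class
    term_counts = defaultdict(Counter)            # F_{j,C_i}
    for tokens, c in zip(docs, labels):
        term_counts[c].update(tokens)

    log_prior = {c: math.log(n / len(docs)) for c, n in class_doc_counts.items()}
    log_cond = {}
    for c in class_doc_counts:
        total = sum(term_counts[c].values())      # sum_k F_{k,C_i}
        log_cond[c] = {
            w: math.log((1 + term_counts[c][w]) / (q + total)) for w in vocab
        }
    return log_prior, log_cond


def log_scores(tokens, log_prior, log_cond):
    """Unnormalized log-posterior of each class (Eqs. 3.3 and 3.6)."""
    scores = {}
    for c in log_prior:
        score = log_prior[c]
        for w in tokens:
            # Adding log P(w|c) once per occurrence equals f_{j,T} * log P(w|c).
            # Out-of-vocabulary tokens are skipped (an assumption of this sketch).
            if w in log_cond[c]:
                score += log_cond[c][w]
        scores[c] = score
    return scores


def predict(tokens, log_prior, log_cond):
    """Class with the highest score (Eq. 3.1)."""
    scores = log_scores(tokens, log_prior, log_cond)
    return max(scores, key=scores.get)
```

A typical use is to call `train_multinomial_nb` once on complete training documents and then call `predict` on the token list of a test document.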

3.2 Early Naive Bayes

In early text classification we assume that during training
we have full documents; therefore, the same training procedure
as for the standard naive Bayes classifier is performed for
estimating the necessary probabilities². The difference comes
at inference time: when classifying a new document we assume
we read it in sequential order starting from the beginning
(i.e., the first word from top to bottom and from
left to right). W.l.o.g.³, at time $t$ we assume we have
read the first $t$ terms in the document (i.e., one word
is read at each time step). Let $d_T$ denote the document we
want to classify, containing $M_{d_T}$ words; then
$d_T = w_1, w_2, \ldots, w_{M_{d_T}}$.

We notice from Equations (3.5)-(3.7) that we can in fact
make predictions for document $d_T$ regardless of the
amount of information we have read from it: at time
$t$ we know that $d_T = w_1, \ldots, w_t$; therefore, we can
generate a bag-of-words representation $\mathbf{x}_T$ for
$d_T$ as follows: $\mathbf{x}_T = (x_{T,1}, \ldots, x_{T,q})$,
where $x_{T,j}$ indicates the frequency of occurrence of the
$j$th term in document $d_T$ (i.e., a tf weighting scheme).
Terms not occurring in $d_T$ or not seen so far at time $t$
are assigned values of $x_{T,j} = 0$. With this representation
we can use Equation (3.3) directly to classify the document.
Actually, we can attempt to classify document $d_T$ without
having read any information (i.e., with $t = 0$); of course,
the decision is then determined by the priors, see
Equation (3.5). As simple as this, we can use naive Bayes
to perform early classification.

²One may also train naive Bayes with partial documents;
however, in that case the probability estimates associated to the
model are not reliable because they are obtained from reduced
documents. In preliminary experiments we corroborated this fact.

³One should note that we can take steps of any length, instead
of processing word by word.
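The inference procedure described in this subsection can be sketched in a few lines of Python. The sketch assumes a model that has already been trained on full documents and is given as two dictionaries of log-probabilities; the function name `early_predictions` and the toy parameter values below are hypothetical and serve only as an illustration. Out-of-vocabulary words are assumed to contribute nothing to the score.

```python
import math


def early_predictions(tokens, log_prior, log_cond):
    """Predicted class after reading t = 0, 1, ..., len(tokens) words.

    log_prior : dict class -> log P(C_i)
    log_cond  : dict class -> dict term -> log P(term | C_i)
    """
    scores = dict(log_prior)                      # t = 0: priors only
    predictions = [max(scores, key=scores.get)]
    for w in tokens:                              # read one word per time step
        for c in scores:
            # Words unknown to the model add 0 to the log-score
            # (an assumption of this sketch).
            scores[c] += log_cond[c].get(w, 0.0)
        predictions.append(max(scores, key=scores.get))
    return predictions


# Hypothetical toy model with two classes and three terms.
TOY_LOG_PRIOR = {"sport": math.log(0.4), "politics": math.log(0.6)}
TOY_LOG_COND = {
    "sport": {"the": math.log(0.4), "goal": math.log(0.5), "vote": math.log(0.1)},
    "politics": {"the": math.log(0.4), "goal": math.log(0.1), "vote": math.log(0.5)},
}

if __name__ == "__main__":
    print(early_predictions(["the", "goal", "goal"], TOY_LOG_PRIOR, TOY_LOG_COND))
```

Because the score of each class is updated by one addition per word read, a prediction is available at every time step at negligible extra cost.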

We now briefly analyze the main components in play
when making predictions early. At time $t$
one can rewrite Equation (3.4) as:

(3.8)    $P(C_i \mid \mathbf{x}_T) \propto P(C_i) \prod_{j:\, j \in d_T} P(x_{T,j} \mid C_i) \prod_{k:\, k \notin d_T} P(x_{T,k} \mid C_i)$

The second factor (the product over $j \in d_T$) accounts for
the terms appearing in the document (probabilities are affected
by the frequency of occurrence of such terms in $d_T$ so
far); the third factor (the product over $k \notin d_T$) simply
reduces to 1 (because the exponent in Equation (3.6) is zero
for terms not seen so far). Therefore,
for small values of $t$ the priors dominate the decision;
as $t$ increases, the content of the document will dominate
the other factors. The way these three
components are estimated can thus be crucial for improving
the performance of naive Bayes in early classification.
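One way to make this decomposition explicit, assuming the multinomial form of Equation (3.6), is to take logarithms of the unnormalized posterior. The following worked form is a sketch derived from the equations of Section 3 and is not taken from the original derivation:

```latex
\log P(C_i \mid \mathbf{x}_T)
  = \underbrace{\log P(C_i)}_{\text{prior}}
  + \underbrace{\sum_{j:\, j \in d_T} f_{j,T}\, \log P(x_{T,j} \mid C_i)}_{\text{terms read up to time } t}
  + \underbrace{\sum_{k:\, k \notin d_T} 0 \cdot \log P(x_{T,k} \mid C_i)}_{=\, 0}
  + \text{const}
```

At $t = 0$ every frequency $f_{j,T}$ is zero, so the score of class $C_i$ reduces to $\log P(C_i)$ and the predicted class is the one with the largest prior. Each word read afterwards adds one $\log P(x_{T,j} \mid C_i)$ term to the score of every class.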

Despite the simplicity of this early text classification
approach, we will see in the next section that it
compares favorably with a more complicated solution
from the state of the art. We show its validity on a variety
of problems. This paper motivates further work on
extending this model for early text classification. For
instance, one can define/modify adaptive priors that
change as the value of $t$ increases; one can implement the
same idea with methods that take into account term
dependencies (see, e.g., [6], [17], [20]) in order to increase
the predictive power of the classifier; one can also adopt
advanced/alternative smoothing techniques to account
properly for partial and missing information [IIllIT];
and there are many other possibilities. The main goal of this
paper is to show that naive Bayes can be used for early
text classification and that its performance is competitive
with the single existing solution to this problem.
We foresee that our work will pave the way for the
development of a new type of model.

4 Experiments and results 

For experimentation we considered the data sets described
in Table 1. We considered three standard thematic
text categorization tasks (also used in [4]) and a
data set for sexual predator detection [S]. All of the data
and our code will be made available upon request for future
comparisons. In the subsections below we provide
details on each data set and report the corresponding
experimental results.


Data set        Classes   Terms     Red. V.   Train    Test

Text categorization
Reuters-8       8         23583     2483      5339     2333
20-Newsgroup    20        61188     6894      11269    7505
WebKB           4         7770      3727      2458     1709

Sexual predator detection
SPD             2         155886    6770      6588     15329

Table 1: Data sets considered for experimentation. Red. V.
is the number of terms when a reduced vocabulary is used.

Text data sets were processed as follows: stop
words were removed, then stemming was applied, and finally
the bag-of-words representation was obtained using the
TMG toolbox with a term-frequency (tf) weighting scheme
[^. All of the data were processed in
Matlab. For most experiments we used reduced
vocabularies, that is, we used only a subset of the most
frequent words/terms (see column 4 in Table 1). We
proceeded in this way for efficiency; nevertheless, we also
report results with full vocabularies on the text
categorization data sets.

In addition to the comparison to the state of the art,
we considered a linear SVM classifier as a baseline, since
this is a mandatory baseline in text classification |101
[H. The SVM was used in early classification in the same
way as the naive Bayes model: it was trained with complete
documents and, for making predictions, the bag of
words of a document up to time t was obtained and fed
to the SVM classifier. In preliminary experimentation
we compared SVM with tf and tfidf weighting schemes;
we report the performance of SVM with the latter
scheme because we obtained better results with this
configuration.

In all of our experiments we report the performance
of the early text classifiers when varying the percentage
of the words read from test documents (same procedure as
in [4]). The macro-averaged $f_1$ measure was used for
multi-class text categorization problems and the $f_1$ of the
minority class (i.e., predators) for the sexual predator
detection data set. Ideally, the performance of a good early
text classifier should draw a curve close to the y-axis (see
the figures below), i.e., better performance with less
information. A different problem, not evaluated in this
paper, is that of triggering a prediction whenever the
classifier is sure about the class of a document. Please note,
however, that simple triggering mechanisms can be derived
for our proposed formulation, e.g., after seeing a
predefined number of words, or when the difference between
the most probable and the second most probable
class exceeds a threshold, and so on.
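The two triggering rules mentioned above can be sketched as follows. This is only an illustration of the idea, not a mechanism evaluated in this paper: the function names `posterior` and `first_trigger`, the default threshold `margin=0.5` and the parameter `min_words` are all hypothetical, the model is assumed to be given as dictionaries of log-probabilities with at least two classes, and the margin is computed on normalized posteriors.

```python
import math


def posterior(log_scores):
    """Normalize unnormalized log-scores into posterior probabilities."""
    m = max(log_scores.values())                  # subtract max for stability
    exp_scores = {c: math.exp(s - m) for c, s in log_scores.items()}
    z = sum(exp_scores.values())
    return {c: v / z for c, v in exp_scores.items()}


def first_trigger(tokens, log_prior, log_cond, margin=0.5, min_words=1):
    """Return (t, label) for the first time step t at which a prediction fires.

    A prediction fires when at least `min_words` words have been read and the
    gap between the two most probable classes exceeds `margin`. If this never
    happens, the prediction made after the last word is returned.
    """
    scores = dict(log_prior)
    for t in range(len(tokens) + 1):
        if t > 0:
            w = tokens[t - 1]
            for c in scores:
                # Unknown words add nothing (an assumption of this sketch).
                scores[c] += log_cond[c].get(w, 0.0)
        best = max(scores, key=scores.get)
        probs = sorted(posterior(scores).values(), reverse=True)
        if t >= min_words and probs[0] - probs[1] > margin:
            return t, best
    return len(tokens), best
```

Setting `margin` close to 1 makes the classifier wait for strong evidence, while `min_words` implements the simpler rule of reading a predefined number of words before a prediction is allowed.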

4.1 Early text categorization

First we analyze the performance of early naive Bayes on
thematic text classification. The first three data sets from
Table 1 were considered; these are widely used benchmark data
sets for text categorization, and the standard training/testing
partitions⁴ were used. Results of this experiment are
shown in Figure 1.

Figure 1: Early text classification on standard data sets
(top plot: 20Newsgroup).

⁴As reported in: http://web.ist.utl.pt/acardoso/datasets/

It can be seen in the top plot that the early naive
Bayes (ENB hereafter) classifier considerably outperforms
the SVM baseline on the 20Newsgroup data set.
For both methods, the performance increased monotonically
and, as expected, better performance was obtained
when more information was considered.

The middle and bottom plots in Figure 1 show
results for Reuters-8 and WebKB, respectively. In these
plots we show the performance of both methods, ENB
and SVM, when using all of the vocabulary (full)
and a reduced one (for the 20Newsgroup data set we were
not able to run an experiment with the full vocabulary
in reasonable time). Regardless of the vocabulary
used, ENB outperforms SVM. However, using the full
vocabulary had opposite effects on the two data sets.
In Reuters-8, using the whole vocabulary reduced the
performance of both methods, mainly when using less
than 50% of the information; in WebKB the performance of
ENB is virtually the same, but the performance of SVM
increased when using the full vocabulary. This may be
due to the specific characteristics of the data. Finally,
in the three data sets it is somewhat evident that the
predictive performance of ENB varies little
after processing about 50% of the texts.

4.2 Comparison with related work

In this section we compare the performance of naive Bayes with
the MDP introduced in [4] using the same data sets
as in the previous section. For this comparison we replicated
the experiment reported by the authors of [4].
For each of the data sets, we used different percentages,
{1%, 5%, 10%, 30%, 50%, 90%}, of the documents for
the training set and the remainder for the test set (this
was not our choice, but the setting proposed by the authors
of the reference paper). Five runs were performed;
in each run the documents for training were randomly
chosen. Average results are shown in Figure 2. The results
of ENB are shown as curves, whereas for the reference
method we report the single best reported result
(shown as markers, one per training set size). Please
note that in [4] the authors optimized the parameters
of their method, called STC, whereas we have used the
default implementation/parameters for ENB.
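The splitting protocol described above can be sketched as follows. The function name `random_splits` and the fixed seed are hypothetical choices made for reproducibility of the illustration; whether the original protocol sampled documents per class (stratified) is not stated here, so the sketch samples uniformly over all documents.

```python
import random

# Training-set fractions used in the comparison with the reference method.
TRAIN_FRACTIONS = [0.01, 0.05, 0.10, 0.30, 0.50, 0.90]


def random_splits(n_docs, train_fraction, n_runs=5, seed=0):
    """Return n_runs (train_indices, test_indices) pairs.

    In each run a random subset of documents is used for training and the
    remainder for testing. Sampling is uniform, not stratified by class
    (an assumption of this sketch).
    """
    rng = random.Random(seed)
    n_train = max(1, round(train_fraction * n_docs))
    splits = []
    for _ in range(n_runs):
        indices = list(range(n_docs))
        rng.shuffle(indices)
        splits.append((sorted(indices[:n_train]), sorted(indices[n_train:])))
    return splits
```

For each fraction in `TRAIN_FRACTIONS`, the classifier would be trained and evaluated on every split and the resulting scores averaged over the five runs.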

From Figure 2 it can be seen that the percentage of
training documents used for learning the model considerably
affects the performance of ENB. In all three cases,
using less than 30% of the samples for training results
in low performance. This may be due to the fact
that with small numbers of training documents, the
estimated probabilities are not very representative of
the classification task (and so, it is not advisable to
estimate probabilities from partial information only).
The best results were obtained when using 50% or
90% of the instances for training the model. We can also
notice that the performance stabilizes after 40% of the
information has been processed.

Figure 2: Comparison of ENB and the reference method
STC.

When comparing the ENB approach with the sequential
text classification technique (STC) from [4], it
can be seen that the MDP from the reference work and
our ENB perform very similarly (even though we only show
the best/optimized results for STC). This is a very
interesting result: we obtained performance comparable to
that of a more complex model with a much simpler and more
efficient technique.

4.3 Sexual predator detection

We now evaluate the performance of ENB on the task of sexual
predator detection. We used the development/test partitions
of the data set used in the sexual predator competition
from PAN'12 [9], see Table 1. This corpus contains a
large number of chat conversations, some of which include
a sexual predator trying to approach a child⁵. The
problem approached in the original competition was to
identify sexual predators from many chat conversations.
In this work, however, we approach the problem of detecting
conversations in which a potential sexual predator takes
part. We proceeded in this way because the original task
was one of forensic analysis: detect predators offline using
all of the conversations in which they were involved
(see m for our solution that obtained the best result in
that challenge). Our ultimate goal, on the other hand,
is to detect, as early as possible, conversations in which
a sexual predator is involved, so that sexual attacks
can be prevented and an alert for parents/police
officers can be issued. Based on our previous results
from [HI, and on the literature on non-thematic text
classification, we decided to represent chat conversations
with character 3-grams (i.e., terms in this data set
are sequences of 3 letters extracted from the training
corpus); with this data set we used a reduced vocabulary
and the preprocessing described in [1^. As
suggested in [9], for this experiment we report the $f_1$
measure on the minority class (i.e., predators). Results of
this experiment are shown in Figure 3.
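The character 3-gram representation mentioned above can be sketched as follows. The function names `char_ngrams` and `ngram_counts` are hypothetical, and the sketch makes one assumption that this section does not specify: every character, including spaces and punctuation, is kept when extracting the n-grams.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """All overlapping character n-grams of `text`, in reading order.

    Whitespace and punctuation are kept as ordinary characters
    (an assumption of this sketch).
    """
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_counts(text, n=3):
    """Term-frequency (tf) representation of `text` over character n-grams."""
    return Counter(char_ngrams(text, n))
```

Because the n-grams are produced in reading order, the list returned by `char_ngrams` can be consumed sequentially in the same way as the word sequence of the thematic data sets.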


Figure 3: Early classification performance on detection
of sexual predators.

We can see that this is a very difficult task:
the performance of both models, SVM and
ENB, is somewhat low, even when the whole information
from documents is used (the highest performance is
lower than 70% of the $f_1$ measure). This is not a surprising
result if we notice that this problem is highly imbalanced:
the imbalance ratios for the training and test partitions
are 12.1 and 9.56, respectively. Furthermore, the
reduction of the vocabulary may significantly affect this
particular domain (the jargon used in chat conversations
is quite diverse and rich). Despite the difficulty of the
problem, we can see that the ENB method again outperforms
the SVM model in most cases. The results shown
in this section make evident the need for better methods
for early text classification.

⁵Police officers acted as children; predators are real.

5 Conclusions 

We described the use of naive Bayes for early text
classification. A minor modification to naive Bayes allows us
to make predictions using partial information. We showed
the effectiveness of this simple approach on three types
of problems and compared its performance with the only
existing state-of-the-art method. Our method compares
favorably in terms of both effectiveness and earliness
with the reference method, a much more
complex model. Also, our method consistently outperformed
an SVM baseline. Furthermore, we are the first
to approach the early classification of chat conversations
for detecting sexual predators. Although the results
are encouraging, much work remains to be done. We
foresee that our work will pave the way for the development
of more elaborate techniques based on naive Bayes for
early classification.

The following conclusions can be drawn from our 
work: 

• Naive Bayes proved to be very effective for early
text classification, obtaining results comparable to
the state of the art. The inference complexity of naive
Bayes is negligible (adding the values of $q$ terms,
$K$ times), which makes this method preferable
to the MDP introduced in [4].

• Naive Bayes is a promising solution to the early
classification problem. Competitive performance
was obtained with a rather straightforward
implementation; better results are expected with improved
versions of the classifier.

• It is possible to anticipate the detection of sexual
predators, and naive Bayes is a potential solution
to this problem.

There are many directions for future work, for instance,
exploiting research advances in extensions of naive Bayes
(see Section 2) for early text classification. It is also very
important to develop spotting mechanisms that can be combined
with the early naive Bayes technique. Finally, theoretical
analyses of the problem and the proposed method are
very much needed.